What We'll Cover
Every time you send a message to an AI assistant, something physical happens: electricity flows through data-centre hardware on the other side of the world. This session puts numbers on that — not to make you feel guilty about using AI, but to help you think clearly about its environmental implications as a researcher and as a citizen.
We will look at energy consumption from the level of a single query up to the scale of global AI usage, at water — an environmental cost that receives less attention than carbon — and at why the numbers you read in the media should always be treated with careful scepticism.
This is a topic where the data are genuinely uncertain and where companies have strong incentives to present their products in the best possible light. Part of our job here is to understand why the numbers are uncertain, not just what they are.
⚠️ A Note on Numbers Throughout This Week
Almost all figures in this area come from one of three sources: company self-reports, independent academic estimates, or media extrapolations from those estimates. These often disagree substantially. Throughout these lessons we will flag where figures come from and what assumptions they rest on. Treat any specific number as an order-of-magnitude guide rather than a precise measurement.
🔍 The Transparency Problem
Before we look at any numbers, it is worth understanding why getting accurate figures is so difficult — because the answer shapes how we interpret everything that follows.
What Companies Don't Disclose
Major AI developers — OpenAI, Google, Meta, Microsoft, Anthropic — do not publish detailed energy or emissions data for individual models. What is typically missing:
- Energy consumed per query (by model type or size)
- Data centre locations and their grid carbon intensity
- Water consumption at specific facilities
- Hardware lifecycle data (manufacturing emissions)
- Total annual compute for model training
Some companies publish aggregate sustainability reports, but these cover their entire operations, not AI specifically, and rely on accounting methods that can obscure more than they reveal.
What Researchers Can Estimate
In the absence of direct data, researchers use indirect methods to estimate AI's environmental footprint:
- Hardware benchmarks: Measure energy use of known GPUs, estimate utilisation rates
- Open-source proxies: Run open-weight models locally and measure directly
- Architectural inference: Use known model sizes and FLOPs estimates to project energy
- Company disclosures: Use reported figures with appropriate scepticism about representativeness
This is why published estimates for the same model can differ by a factor of 10 or more.
📄 Key Reading: The Hidden Costs of AI
Nature (2024): "Generative AI’s environmental costs are soaring — and mostly secret" — Nature's news team interviews researchers and industry figures on the difficulty of getting accurate data from companies. A good starting point for understanding the transparency problem.
⚡ Energy per Interaction
Let's start at the level of a single query. How much electricity does it take to generate one response?
📊 Energy Consumption by Task Type
The table below combines company-reported figures and independent measurements from MIT Technology Review (2025). Note the wide ranges — these reflect genuine differences in model size, hardware, and methodology, not just uncertainty.
| Task | Energy Estimate | Everyday Comparison | Source / Confidence |
|---|---|---|---|
| Google Search query | ~0.3 Wh | ~11 seconds of a light bulb (LED) | Google; moderate confidence |
| ChatGPT text prompt (reported) | 0.34 Wh | ~13 seconds of a light bulb | OpenAI blog, June 2025; likely best-case |
| Gemini text prompt (reported) | 0.24 Wh | ~9 seconds of TV | Google White Paper, 2025; likely best-case |
| Large model text prompt (independent) | 0.1–8 Wh | Under 1 second to ~30 seconds of a microwave | MIT Technology Review, 2025; measured on open-weight models |
| AI image generation | ~2–5 Wh | ~10–20 seconds of a microwave | Various; moderate confidence |
| AI video generation (5 seconds) | ~3.4 MJ (~944 Wh) | Approximately 1 hour of microwave use | MIT Technology Review, 2025; ~700× image cost |
Key observation: Text generation and video generation are not in the same ballpark — they differ by roughly three orders of magnitude. Company-reported figures for text prompts are at the lower end of independent measurements, likely reflecting optimised infrastructure not available to all deployments.
📹 Video Generation: A Different Category
The energy cost of generating a 5-second AI video (~944 Wh, per MIT Technology Review 2025) is not a typing error. High-resolution video generation involves running a very large diffusion model many times over — the equivalent of generating hundreds or thousands of images sequentially and combining them. This figure comes from measuring open-source video generation models; proprietary models like Sora may differ.
For context: 944 Wh is roughly one-seventh of what an average South African household uses in a day (around 6–7 kWh) — a meaningful amount for a few seconds of video, but still a fraction of daily household use.
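The ratios quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it takes the table's figures at face value, and the variable names and the 6.5 kWh household midpoint are our own assumptions rather than anything from the sources.

```python
import math

# Energy figures in watt-hours, copied from the table above.
TEXT_PROMPT_WH = {
    "ChatGPT (reported)": 0.34,
    "Gemini (reported)": 0.24,
    "Large open-weight model (upper bound)": 8.0,
}
VIDEO_5S_WH = 944.0

# Assumption: midpoint of the 6-7 kWh/day South African household range.
HOUSEHOLD_DAILY_WH = 6500.0


def orders_of_magnitude(ratio: float) -> float:
    """Return log10 of a ratio, i.e. how many powers of ten it spans."""
    return math.log10(ratio)


for label, wh in TEXT_PROMPT_WH.items():
    ratio = VIDEO_5S_WH / wh
    print(f"{label}: video is {ratio:,.0f}x a text prompt "
          f"({orders_of_magnitude(ratio):.1f} orders of magnitude)")

household_share = VIDEO_5S_WH / HOUSEHOLD_DAILY_WH
print(f"One 5-second video is about {household_share:.0%} of a household day")
```

Against the large-model upper bound the gap shrinks to about two orders of magnitude, so "three orders of magnitude" describes the comparison with company-reported prompts specifically.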
🤖 The Agentic Multiplier: Tools Like Claude Code
The figures above describe a single prompt: you ask, the model answers once. But the way many researchers now use AI is very different. Agentic tools — Claude Code, Codex, Cursor's agent mode, Gemini CLI — do not answer once. They run a loop: read files, call tools, run code, read the output, reason about it, and call the model again, often dozens of times to complete one task. Energy use scales roughly with the total number of tokens the model processes, so an agentic task can cost many times more than a single chat prompt. This section tries to put bounds on "how much more" — and, as with everything in this lesson, the honest answer is a wide range.
⚠️ This is an estimate built on other estimates
Every figure in this section is derived by chaining together numbers from earlier in the lesson with practitioner reports and vendor documentation. None of it is a direct measurement of a research workflow. We present each input with its source and give a central estimate with lower and upper bounds so you can see exactly where the uncertainty enters. Treat the final numbers as order-of-magnitude guides, not precise figures.
🎚️ What "effort" Means in Claude Code
Claude's API exposes an effort setting that controls how many tokens the model is willing to spend on a task. Anthropic's documentation describes it as "a behavioral signal, not a strict token budget" — so it does not map onto a fixed energy figure, but it does directly change token use, and therefore energy. Crucially, effort affects all tokens in a response, including tool calls: lower effort means the model makes fewer tool calls and writes less; higher effort means more exploration, more tool calls, and deeper reasoning.
| Effort level | What Anthropic's docs say it is for | Illustrative energy multiplier (relative to high = 1×) |
|---|---|---|
| low | Simplest tasks, fastest, lowest cost; "scopes its work to what was asked" | ~0.3–0.5× |
| medium | Balanced; the "drop-in for the average workflow" where you want to reduce cost | ~0.6–0.8× |
| high (API default) | Complex reasoning and agentic tasks; the "sweet spot" balancing quality and tokens | 1× (reference) |
| xhigh (Opus 4.7) | Long-running agentic/coding tasks over 30 minutes; "token budgets in the millions"; "meaningfully higher token usage than high" | ~2–5× |
| max | "No constraints on token spending"; reserve for frontier problems — "significant cost for relatively small quality gains" and can "overthink" | ~3–10× |
Important: The "multiplier" column is our own illustrative estimate, inferred from the qualitative descriptions in Anthropic's documentation. Anthropic does not publish numeric energy or token multipliers for the effort levels. The short quoted phrases describing each level come directly from Anthropic's Effort documentation; the numbers attached to them do not. We include the multipliers only to make the rest of the calculation concrete — they could easily be off by a factor of two or more.
🔢 The Calculation Ledger: Every Input and Its Source
To estimate the footprint of a working day with an agentic tool, we need three quantities, each with a central value, a range, and a source. (How many tasks make up a working day is a separate scenario assumption, stated below the table.)
| Input | Lower | Central | Upper | Source & reasoning |
|---|---|---|---|---|
| A. Energy per model call | ~1 Wh | ~3 Wh | ~8 Wh | This lesson's table: 0.34 Wh (OpenAI, reported best-case) up to 8 Wh (MIT Tech Review 2025, large open-weight model). Agentic calls carry long contexts (file contents, tool outputs, accumulated reasoning), so they sit toward the upper end — we centre on ~3 Wh. |
| B. Compute per task (simple-prompt-equivalents) | ~10× | ~30× | ~100× | An agentic task is many model calls on large contexts. Analyses converge on roughly 5–30× the tokens of a single chat interaction for typical agents, rising to 100× or more for complex coding workflows (the Stanford Digital Economy Lab; SWE-bench-style coding tasks average 1–3.5M tokens/task). We take 10–100× as a central band (possibly conservative at the top). Anthropic's docs confirm the direction ("token budgets in the millions" at xhigh) but give no number. |
| C. Grid carbon intensity | 0.386 kg CO₂/kWh (US grid average) | n/a | ~0.9 kg CO₂/kWh (South Africa) | US figure as used elsewhere in this lesson; SA figure from the 2022 Grid Emission Factors Report (~0.87–1.01 kg/kWh depending on methodology; SA's grid is ~80% coal). |
📊 Putting the Inputs Together: A Day on high Effort
Multiplying A × B gives the energy of one task. For a working day we assume ~50 substantial, compute- and loop-intensive tasks (not quick one-shot lookups, which cost far less). We hold the per-call energy (A) near its ~3 Wh midpoint and let the per-task multiplier (B) drive the range, rounding the central 90 Wh per task to ~100 Wh; applying the grid carbon intensity (C) then converts energy to CO₂. (Letting A vary across its full 1–8 Wh band as well would widen these numbers further in both directions.)
| Quantity | Lower | Central estimate | Upper |
|---|---|---|---|
| Energy per agentic task (A × B) | ~30 Wh | ~100 Wh | ~300 Wh |
| Energy for a 50-task day (× 50) | ~1.5 kWh | ~5 kWh | ~15 kWh |
| CO₂ — US grid (× 0.386) | ~0.6 kg | ~1.9 kg | ~5.8 kg |
| CO₂ — South African grid (× 0.9) | ~1.4 kg | ~4.5 kg | ~13.5 kg |
For perspective, the central ~5 kWh for a full day of agentic coding is roughly five times the 944 Wh of a single 5-second AI video, and more than ten thousand times a single reported text prompt. It is also no longer a small share of household electricity: it approaches the 6–7 kWh that an average South African household uses in a day. The footprint of this kind of AI use comes from the loop, not from any one prompt.
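The table above is simple enough to reproduce in code, which makes it easy to swap in your own assumptions. This is a minimal sketch: the function and variable names are ours, and the per-task inputs are the table's rounded values (3 Wh × 10/30/100 gives 30/90/300 Wh, with 90 rounded up to ~100).

```python
# Per-task energy in Wh: the rounded values from the table above.
ENERGY_PER_TASK_WH = {"lower": 30.0, "central": 100.0, "upper": 300.0}
TASKS_PER_DAY = 50  # scenario assumption stated in the text
GRID_KG_PER_KWH = {"US": 0.386, "South Africa": 0.9}


def daily_energy_kwh(per_task_wh: float, tasks: int = TASKS_PER_DAY) -> float:
    """Energy for one working day, converted from Wh to kWh."""
    return per_task_wh * tasks / 1000.0


def daily_co2_kg(energy_kwh: float, grid_kg_per_kwh: float) -> float:
    """Operational CO2 for that energy at a given grid carbon intensity."""
    return energy_kwh * grid_kg_per_kwh


for bound, per_task in ENERGY_PER_TASK_WH.items():
    kwh = daily_energy_kwh(per_task)
    line = f"{bound:>7}: {kwh:5.1f} kWh/day"
    for grid, intensity in GRID_KG_PER_KWH.items():
        line += f" | {grid}: {daily_co2_kg(kwh, intensity):.2f} kg CO2"
    print(line)
```

Using the unrounded central value of 90 Wh per task gives 4.5 kWh per day instead of 5 kWh, a reminder that part of any headline figure is rounding.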
🚗 How Does This Compare to Driving a Car?
A "typical passenger vehicle" emits about 400 g CO₂ per mile ≈ 0.25 kg CO₂ per km (US EPA). Dividing the day's emissions by that figure gives an equivalent driving distance:
- On the US grid: a full day ≈ ~8 km of driving (range ~2–23 km)
- On South Africa's coal-heavy grid: a full day ≈ ~18 km of driving (range ~5–54 km)
In other words, a researcher's central-estimate day on high effort is comparable to a short commute. (Newer petrol cars emit closer to 0.12–0.17 kg/km, which would roughly double these equivalent distances — the car comparison is itself a range.)
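The driving equivalents are a single division, sketched here so that the sensitivity to the car's emission factor is explicit. The names are illustrative; the daily CO₂ values are the rounded table figures for a 50-task day, and the emission factors are the ones quoted in this section.

```python
# Tailpipe emission factors in kg CO2 per km.
EPA_TYPICAL_KG_PER_KM = 0.25           # ~400 g/mile, as quoted above
NEWER_PETROL_KG_PER_KM = (0.12, 0.17)  # range quoted above

# Daily CO2 (lower, central, upper) in kg for the 50-task day.
DAILY_CO2_KG = {
    "US grid": (0.6, 1.9, 5.8),
    "South African grid": (1.4, 4.5, 13.5),
}


def equivalent_km(co2_kg: float, car_kg_per_km: float = EPA_TYPICAL_KG_PER_KM) -> float:
    """Distance a car would drive to emit the same mass of CO2."""
    return co2_kg / car_kg_per_km


for grid, bounds in DAILY_CO2_KG.items():
    low, mid, high = (equivalent_km(kg) for kg in bounds)
    print(f"{grid}: ~{mid:.0f} km (range {low:.0f}-{high:.0f} km)")

# Sensitivity: the same central US day measured against a more efficient car.
for factor in NEWER_PETROL_KG_PER_KM:
    print(f"At {factor} kg/km: ~{equivalent_km(1.9, factor):.0f} km")
```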
⚠️ The Ceiling: xhigh and max Effort
The day above assumes high effort. Pushing every task to xhigh or max, which Anthropic reserves for "genuinely frontier problems" and warns adds "significant cost for relatively small quality gains", could multiply token use a further ~3–10× (our illustrative figure). Applied to the central ~5 kWh day, that puts a heavy max-effort day in the region of 15–50 kWh, or roughly 55–180 km of equivalent driving on the SA grid. But note that effort is a ceiling, not a floor: even on max, a simple request still resolves cheaply, because the model spends tokens roughly in proportion to a task's difficulty. These figures assume 50 genuinely compute- and loop-intensive tasks; a day padded with quick lookups would land well below them. The practical lesson is the same one Anthropic's own guidance gives: do not run at maximum effort by default. Match the effort to the task, both for cost and for footprint.
💡 What This Means for You as a Researcher
The per-prompt footprint of AI is genuinely small. The footprint of agentic AI is larger — not because any single step is expensive, but because there are so many steps. The same property that makes these tools powerful for research (they keep working autonomously) is what drives their energy use.
This connects directly to the rebound problem we examine in the next session: as agentic tools make each task cheaper and easier, we tend to run far more of them. The individual cost falls; the total can still rise. The responsible move is not to avoid these tools, but to use the right effort level for the job, and to be honest — as we have tried to be here — about how uncertain the numbers really are.
✈️ A Worked Example: The Flight Comparison
A common comparison in media coverage: how does using AI compare to taking a transatlantic flight? Let's work through this carefully — because how you do the calculation matters enormously.
🔢 Step-by-Step Calculation
Reference point: one economy-class transatlantic flight (London → New York, ~5,540 km)
- CO₂ per passenger (direct emissions only): ~0.5 tonnes
- Including radiative forcing at altitude (contrails, water vapour): roughly doubles the climate impact
- Commonly used estimate: ~1 tonne CO₂e per passenger
- Note: figures from different calculators range from 0.5 to 1.5 tonnes — this is itself uncertain
Carbon per ChatGPT text prompt:
| Energy assumption | Source | CO₂ per prompt (US grid avg: 0.386 kg/kWh) | Calls to match 1-tonne flight |
|---|---|---|---|
| 0.34 Wh | OpenAI (reported, 2025) | 0.13 g CO₂ | ~7.6 million |
| 2 Wh | Mid-range independent estimate | 0.77 g CO₂ | ~1.3 million |
| 8 Wh | Large model, independent measurement | 3.1 g CO₂ | ~323,000 |
What this tells us: Depending entirely on which figures you use, one transatlantic flight equals somewhere between 300,000 and 7.6 million ChatGPT text queries. This is not a rounding error — it reflects the difference between company-optimised infrastructure and real-world large-model deployments, as well as genuine uncertainty about what "a ChatGPT query" even means in terms of model size and compute.
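The table's arithmetic can be reproduced directly, and doing so makes it easy to see what happens when the flight figure itself is varied. This is a sketch under the stated assumptions (US-average grid, the 1-tonne flight as default); the function names are ours.

```python
US_GRID_KG_PER_KWH = 0.386
FLIGHT_KG_CO2E = 1000.0  # the "commonly used" 1-tonne figure

# Per-prompt energy assumptions in Wh, from the table above.
PROMPT_WH = {"OpenAI reported": 0.34, "mid-range independent": 2.0, "large model": 8.0}


def grams_co2_per_prompt(energy_wh: float,
                         grid_kg_per_kwh: float = US_GRID_KG_PER_KWH) -> float:
    """Wh x (kg/kWh) is numerically grams, because both units scale by 1000."""
    return energy_wh * grid_kg_per_kwh


def prompts_per_flight(energy_wh: float, flight_kg: float = FLIGHT_KG_CO2E) -> float:
    """Number of prompts whose combined CO2 equals one flight."""
    return flight_kg * 1000.0 / grams_co2_per_prompt(energy_wh)


for label, wh in PROMPT_WH.items():
    print(f"{label}: {grams_co2_per_prompt(wh):.2f} g/prompt, "
          f"{prompts_per_flight(wh):,.0f} prompts per flight")

# The flight itself is uncertain (0.5-1.5 t), which stretches the range further.
widest = (prompts_per_flight(8.0, 500.0), prompts_per_flight(0.34, 1500.0))
print(f"Widest range: {widest[0]:,.0f} to {widest[1]:,.0f} prompts")
```

Combining the lowest flight estimate with the highest prompt energy, and vice versa, widens the range to roughly 160,000 to 11 million prompts.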
💡 Why the Range Is the Lesson
The fact that this calculation can produce answers spanning more than an order of magnitude (a factor of about 24 between the lowest and highest rows) is not a failure of the analysis; it is the most important finding. It tells us that glib comparisons ("ChatGPT = X flights per day") depend almost entirely on unstated assumptions, and that the opacity of AI companies makes it impossible to pin down a single honest answer.
As researchers, the appropriate response is not to pick the number that fits our preferred narrative, but to present the range honestly and to push for better disclosure.
📄 Source for the energy figures above
MIT Technology Review (2025): "We did the math on AI’s energy footprint. Here’s the story you haven’t heard." — independent measurements of open-source models providing a useful counterpoint to company-reported figures.
Luccioni et al. (2023): "Power Hungry Processing: Watts Driving the Cost of AI Deployment?" — benchmarks inference energy across a wide range of NLP tasks and model types.
🌍 Zooming Out: AI’s Share of Global Emissions
The per-query comparison above is useful for personal calibration — but it can’t tell you what share of global emissions AI actually represents. For that, you have to layer estimates from primary sources and be honest about what is measured versus modelled.
📊 A back-of-envelope: AI vs aviation, with sources
Step 1 — Inputs (primary sources):
- Data centres ≈ 1.5% of global electricity (≈ 415 TWh in 2024) — IEA, Energy and AI (2025).
- AI-focused share ≈ 0.5% of global electricity (2025) — Hannah Ritchie’s estimate from the IEA inputs, Sustainability by Numbers (2026).
- Electricity & heat ≈ 40% of energy-related CO2; energy-related CO2 ≈ 85–90% of total CO2 — IEA / Our World in Data sectoral breakdown.
Step 2 — Calculation: If AI’s electricity were at average grid carbon intensity, AI’s share of energy-related CO2 ≈ 0.5% × 40% ≈ 0.2%. AI loads are concentrated in the US (gas-heavy) and China (coal-heavy), pushing intensity above average; the largest operators’ renewable and nuclear procurement pulls below it (though annual matching overstates hourly cleanliness). Net plausible range: 0.2–0.4% of energy-related CO2, or roughly 0.15–0.35% of total CO2 / GHG depending on denominator.
Step 3 — The aviation benchmark. Aviation accounted for 2.5% of global energy-related CO2 in 2023 (≈ 950 Mt) — IEA, Aviation. Passenger flights are ≈ 81% of that and freight ≈ 19%, so passenger aviation ≈ 2% of global CO2.
Headline: Passenger air travel emits roughly 5–10× what AI does operationally right now. Put the other way, AI is around a tenth to a fifth of passenger aviation’s footprint at the system level.
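The three steps above reduce to two multiplications and a division. The sketch below simply restates them; the constant names are ours, and the 0.2–0.4% range is entered as given in Step 2 rather than derived, because it reflects a judgement about grid intensity and not a formula.

```python
# Inputs from Steps 1 and 3 (shares expressed as fractions).
AI_SHARE_OF_ELECTRICITY = 0.005         # ~0.5% of global electricity
ELECTRICITY_SHARE_OF_ENERGY_CO2 = 0.40  # electricity & heat, ~40%
AVIATION_SHARE_OF_ENERGY_CO2 = 0.025    # 2.5% in 2023
PASSENGER_SHARE_OF_AVIATION = 0.81

# Step 2: average-intensity baseline, then the text's plausible range.
ai_baseline = AI_SHARE_OF_ELECTRICITY * ELECTRICITY_SHARE_OF_ENERGY_CO2
ai_range = (0.002, 0.004)  # 0.2-0.4%, as judged in Step 2

# Step 3: passenger aviation's share of energy-related CO2.
passenger_aviation = AVIATION_SHARE_OF_ENERGY_CO2 * PASSENGER_SHARE_OF_AVIATION

ratio_high = passenger_aviation / ai_range[0]  # AI at the low end of its range
ratio_low = passenger_aviation / ai_range[1]   # AI at the high end of its range

print(f"AI baseline share: {ai_baseline:.2%}")
print(f"Passenger aviation share: {passenger_aviation:.2%}")
print(f"Passenger aviation is {ratio_low:.1f}x to {ratio_high:.1f}x AI")
```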
Two caveats sharpen the comparison further:
- Aviation’s real impact is larger than the CO2 alone. Non-CO2 effects — contrails and NOx at altitude — push aviation’s share of anthropogenic warming closer to 3.5–4% (Lee et al. 2021, Atmospheric Environment). The gap on a warming basis is wider than the gap on a CO2 basis.
- Aviation is concentrated; AI use is increasingly widespread. Only ≈ 11% of the world’s population flew in 2018, and the most frequent ≈ 1% account for more than half of passenger-flight emissions (Gössling & Humpe 2020, Global Environmental Change). AI use is spread across far more people, so the per-person picture is genuinely different from that of flying, even when the system-level numbers sit in the same neighbourhood.
And the ratio is narrowing. The IEA projects data-centre electricity roughly doubling from ≈ 485 TWh (2025) to ≈ 950 TWh (2030, ≈ 3% of global electricity), with the AI-focused slice tripling — moving AI from ≈ one-third toward ≈ half of data-centre load. Even so, the IEA projects all data-centre CO2 at only ≈ 1% of energy-related CO2 by 2030. AI would need several-fold growth relative to aviation to close the gap.
⚠️ How to read these numbers
These are layered estimates, not audited measurements. The 0.2–0.4% calculation rests on three assumptions, each with its own uncertainty: (a) the IEA’s 1.5% data-centre figure, (b) Hannah Ritchie’s one-third allocation to AI specifically, and (c) average grid intensity. Treat the headline as a well-reasoned estimate from primary sources, not as a measurement — and the same scepticism the rest of this lesson asks you to apply to vendor claims applies to careful third-party estimates too.
The figures also exclude embodied emissions: chip fabrication, server manufacture, and data-centre construction (concrete, steel). Week 3.3 returns to those in detail, because lifecycle accounting can raise the operational number materially. And the definition of “AI” versus general accelerated compute is itself fuzzy: not every workload on a GPU is what most people mean by AI.
💧 The Water Footprint
Energy gets most of the attention, but water is the environmental cost of AI that is least reported and least understood. Data centres are thirsty in two distinct ways.
Direct Water Use: Cooling
Modern data centres generate enormous amounts of heat. The most common solution is evaporative cooling — running water over heat exchangers where it evaporates, carrying heat away.
- Why water, not air? Evaporative cooling is far more energy-efficient than air cooling for the temperatures involved
- Why filtered municipal water? Minerals in unfiltered water would corrode and clog sensitive hardware — most large data centres use high-quality drinking water
- Estimate: A 100 MW data centre may consume the equivalent of ~2,600 households' daily water use (IEA estimate)
- US total: Data centres directly consume roughly 17.5 billion gallons of water per year — about 0.3% of the US public water supply (Lawrence Berkeley National Laboratory, 2024)
Indirect Water Use: Electricity
Generating electricity also consumes water — at the power plant, not the data centre. This "indirect" water use is often larger than the direct cooling water.
- Thermal power plants (coal, gas, nuclear) use water for steam generation and cooling
- This water is often not returned to its source — it is evaporated or discharged at a different temperature
- A commonly cited estimate (Shaolei Ren, UC Riverside): a ~30-turn ChatGPT conversation = roughly 500 ml of water, of which only 12–13% is direct cooling water; the rest is from electricity generation
- Geographic sensitivity: Water extracted in arid regions has very different consequences from the same volume extracted in water-abundant areas
⚠️ Treating the 500 ml figure with care
The "500 ml per 30-turn conversation" figure (from Li et al., 2023 / Shaolei Ren) became widely cited in media coverage, often without key caveats: it is an estimate based on assumed data centre locations (US-average), a specific model generation (GPT-3 era), and indirect water attribution methods that are contested. Modern data centres vary widely in water efficiency — some use closed-loop systems that recirculate water rather than evaporating it. Treat this as an order-of-magnitude estimate, not a precise measurement. The original paper acknowledges significant uncertainty.
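To see what the headline figure implies per message, and how it splits between cooling and electricity generation, the numbers above can be broken down as follows. This is a rough sketch that inherits every caveat of the 500 ml estimate; the variable names are illustrative.

```python
CONVERSATION_ML = 500.0      # headline estimate for one conversation
TURNS = 30                   # approximate turns per conversation
DIRECT_SHARE = (0.12, 0.13)  # fraction that is on-site cooling water

per_turn_ml = CONVERSATION_ML / TURNS
direct_ml = tuple(CONVERSATION_ML * share for share in DIRECT_SHARE)
indirect_ml = tuple(CONVERSATION_ML - d for d in direct_ml)

print(f"Per turn: ~{per_turn_ml:.0f} ml")
print(f"Direct cooling: {direct_ml[0]:.0f}-{direct_ml[1]:.0f} ml per conversation")
print(f"Indirect (electricity): {indirect_ml[1]:.0f}-{indirect_ml[0]:.0f} ml per conversation")
```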
📄 Key Reading: AI Water Footprint
Li, Yang, Islam, Ren (2023): "Making AI Less Thirsty" — the most-cited quantitative analysis of AI water consumption. Read the paper, not just the media coverage of it.
📊 Putting It at Scale
Individual query costs only matter if we multiply them by usage. Let's consider what the aggregate picture looks like.
🌍 AI Usage at Global Scale (2025)
| Metric | Figure | Source / Note |
|---|---|---|
| ChatGPT queries per day | ~2.5 billion | OpenAI (reported to Axios, 2025) |
| Organisations using AI | 78% of surveyed organisations | Stanford AI Index, 2025 |
| US data centre electricity (2023) | ~100 TWh/year | Lawrence Berkeley National Laboratory, 2024 (tripled since 2014) |
| Projected AI data centre electricity (2028) | 250–400 TWh/year | Various analysts; high uncertainty |
| US household electricity reference | ~1,200 TWh/year total | US EIA; for comparison purposes |
If AI data centres reach 400 TWh by 2028, that would be roughly equivalent to one-third of all US household electricity consumption. These projections carry very high uncertainty — they depend on assumptions about AI adoption rates, hardware efficiency improvements, and grid composition.
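Scaling a per-query figure to a year is a single multiplication, sketched below. The strong assumption, flagged in the code, is that every query costs the same; real traffic mixes model sizes and task types, so the two results are illustrative bounds for text prompts only, not estimates of ChatGPT's actual consumption.

```python
QUERIES_PER_DAY = 2.5e9  # OpenAI-reported figure from the table above

# Assumption: every query costs the same. Real traffic is a mix of model sizes.
PER_QUERY_WH = {"reported (0.34 Wh)": 0.34, "large-model upper bound (8 Wh)": 8.0}

US_HOUSEHOLD_TWH = 1200.0
PROJECTED_AI_TWH_2028 = (250.0, 400.0)

WH_PER_TWH = 1e12


def annual_twh(queries_per_day: float, wh_per_query: float) -> float:
    """Scale a per-query energy figure to a full year, in TWh."""
    return queries_per_day * wh_per_query * 365 / WH_PER_TWH


for label, wh in PER_QUERY_WH.items():
    print(f"{label}: {annual_twh(QUERIES_PER_DAY, wh):.2f} TWh/year")

for twh in PROJECTED_AI_TWH_2028:
    print(f"{twh:.0f} TWh is {twh / US_HOUSEHOLD_TWH:.0%} of US household use")
```

Even the upper bound for text prompts, about 7 TWh per year, sits far below the 250–400 TWh projection, which covers all AI data-centre load and not a single chat product.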
💡 Training vs. Inference: Where the Energy Goes
Much early coverage of AI's environmental cost focused on the energy required to train a model — the one-time compute cost of creating GPT-4 or Claude. But for widely-deployed models, inference — running the model to answer queries — can easily exceed training energy over the model's lifetime.
With 2.5 billion queries per day, the cumulative inference cost of ChatGPT dwarfs the one-time training cost within months of deployment. This is why statements like "training GPT-3 emitted X tonnes of CO₂" need to be placed in the context of ongoing inference costs.
🔬 Try this yourself (about 10 minutes)
This page argues for calibrated numbers over both panic and dismissal. Here is a quick way to feel both the estimate and the opacity for yourself, in two short steps.
- Put a number on something real. Take one computation or AI task you actually ran recently and estimate its footprint, either with this lesson's per-interaction figures or with a free calculator such as Green Algorithms (we return to these tools in 3.4). Notice how much the answer depends on assumptions you had to choose.
- Now hit the transparency wall. Pick one AI assistant you use and try to find its official per-query energy or water figure: a number from the company you could actually cite. Spend five minutes and then stop. The difficulty you just experienced is the “transparency problem” this page opened with: the figures you can cite are mostly third-party estimates, not disclosed measurements.
The point is not a precise total. It is to come away holding a defensible order-of-magnitude and an honest sense of how much the vendors are not telling you.
📚 Summary & Key Takeaways
Before we can have a productive conversation about AI's environmental impact, we need to understand the numbers — and their limits:
- Corporate opacity is the fundamental problem: AI companies don't disclose the data needed to calculate their environmental footprint, so all estimates involve assumptions
- Text vs. video is not comparable: Video generation uses roughly 1,000× more energy than text generation — these are different categories of use
- The flight comparison spans more than an order of magnitude: 300,000 to 7.6 million queries per flight, depending on model size and whose figures you trust
- Water is underreported: Both direct cooling and indirect electricity generation consume significant water, with geographic implications
- Scale is what matters: Individual query costs are small; billions of queries per day is not
- Inference, not just training: The ongoing cost of running models at scale quickly exceeds one-time training costs
- Agentic tools cost more than single prompts: A day of work with a tool like Claude Code (~50 tasks on high effort) is on the order of ~5 kWh, with a central estimate of ~1.9 kg CO₂ (US grid) / ~4.5 kg (SA grid), comparable to driving ~8–18 km. The cost comes from the loop of many calls, not any one prompt, and pushing to max effort can multiply it several-fold
Next session (Week 3.2): We zoom out from individual queries to the infrastructure level — where does the electricity come from, how does manufacturing hardware fit in, and why do efficiency gains so often fail to reduce total energy use?